Abstract: Data downloading on the fly is the basis of commercial
data services in vehicular networks, such as office-on-wheels and
entertainment-on-wheels. Due to the sparse spatial distribution of
roadside Base Stations (BS) along the road, downloading through
Roadside-to-Vehicle (R2V) connections is intermittent. When multiple
vehicles in geographical proximity have a common interest in certain
objects to download, they can collaborate to significantly reduce
their overall download time. In this paper, we investigate the
application of Network Coding (NC) to collaborative downloading (CD).
We focus on the R2V part of CD, and analytically derive the
probability distribution and expected value of the time needed to
deliver all information to the vehicles with and without NC. Our
results show that using NC slightly improves the downloading time, in
addition to removing any need for uplink communication from the
vehicles to the infrastructure.

I. Introduction 

Recent development and standardization in vehicular ad hoc networks
(VANETs) [1] have motivated increasing interest in data services for
in-vehicle consumption, such as 'commerce- and
entertainment-on-the-wheel' [2]. Hence, in the near future, the number
of vehicles equipped with wireless communication devices is poised to
increase dramatically, i.e., we are moving closer to an intelligent
transportation system. Such a system would provide a wide variety of
applications: local information pushed to vehicles (e.g., traffic
notifications, map updates, location-based advertisements), or
specific data pulled from Internet servers (e.g., neighborhood
parking, reviews of local restaurants, and video clips of local
attractions) [3].

In current intelligent transportation systems, services are provided
to vehicles using the existing wide-area cellular infrastructure
(3G/4G) and/or the roadside infrastructure based on Dedicated Short
Range Communications (DSRC) links¹ proposed for VANETs [2]. However,
both of these approaches have their own challenges: the modest data
rate of present 3G links and their high cost of data downloading on
one hand, and the intermittent hotspot-type roadside coverage
envisaged with DSRC on the other [4].

The above challenges lead to the following simple premise for content
dissemination, called collaborative downloading: if content that is
available at a subset of vehicles is also desired by (many) others in
the network, peer-to-peer content distribution using
vehicle-to-vehicle (V2V) ad hoc communications is time and cost
efficient.

¹ DSRC in North America is based on the 802.11p wireless link access
in the 5.9 GHz band, subsequently folded into the IEEE 1609 Wireless
Access for Vehicular Environments (WAVE) standards.

Collaborative downloading is a data dissemination protocol that
distributes information among all nodes in the network [5], [6], and
it has attracted a lot of attention in the VANET community in the past
few years [7], [8]. In general, data dissemination in VANETs consists
of two phases [9], [10]:

1) Roadside-to-Vehicle (R2V) phase: in this phase, vehicles
communicate with a base station (located at a specific location
on the road) to receive data.

2) Vehicle-to-Vehicle (V2V) phase: when the vehicles are out of the
coverage of the BS, they exchange information with each other.
Once this phase is completed, all the nodes have the same data.

To better understand the concept of collaborative downloading,
consider the scenario in Figure 1, in which a group of vehicles
equipped with DSRC radios are connected to the Internet via roadside
DSRC base stations. Assume that these vehicles have a common interest
in a large file on the Internet. The intermittency arises from the
sparsity of the roadside DSRC base stations and the hotspot nature of
the coverage. As a result, data is downloaded during the (short)
intervals of radio connectivity from the infrastructure to the
vehicle. Thus each vehicle in a platoon only obtains a fraction of the
overall file in each contact duration. In principle, each vehicle
could download the whole file through multiple contacts, but this
incurs significant latency due to the sparsity of the contacts. A more
effective alternative is for the vehicles to form a coalition and
share their respective pieces after each round of contacts
(collaborative downloading), when they are out of coverage (the V2V
phase).

Our focus in this paper is on the R2V phase. We assume perfect V2V
transmission, i.e., data dissemination is complete after the V2V
phase. In other words, we assume that the vehicles hold the same
information once the V2V phase is over.

In the state-of-the-art algorithm for collaborative downloading, nodes
need to communicate with the BS to let it know which packets they are
interested in. That means part of the vehicle-to-BS connection time
(in the R2V phase) has to be assigned to signalling from the vehicles
to the BS². Moreover, this scheme requires a lot of synchronization
and handshaking between the BS and the receiving vehicles. In this
paper, we propose to use network coding in downloading data from the
infrastructure. Our algorithm omits any kind of signaling from the
vehicles to the BS. Moreover, we analytically show that the expected
time needed to disseminate data from the infrastructure to all
vehicles is slightly lower with NC. In other words, NC slightly
decreases the downloading delay in addition to removing any need for
uplink communication.

Fig. 1. System diagram of collaborative downloading among vehicles on
the road.

The rest of the paper is organized as follows. In Section II we
briefly review applications of NC in wireless networks. Section III
presents our system model for collaborative downloading. We derive the
dissemination delay when NC is not exploited in Section IV. The
average time needed to deliver information to all nodes in the network
with the use of NC is derived in Section V. Our evaluation results are
given in Section VI. The paper concludes with reflections on future
work in Section VII. The proofs of the theorems are given in the
Appendix.

Notations: We use bold capitals (e.g. $\mathbf{A}$) to represent
matrices and bold lowercase symbols (e.g. $\mathbf{m}$) for vectors.
The i-th entry of a vector $\mathbf{m}$ is denoted by $m_i$, and
superscript $T$ denotes matrix transpose. We use calligraphic
capitalized symbols (e.g. $\mathcal{T}$) for random variables.

II. Data Dissemination in Wireless Networks 
Using Network Coding: A Review 

Network coding has received considerable attention in recent years for
its potential to achieve the theoretical upper bound (max-flow,
min-cut) of network resource utilization via the introduction of
coding concepts at the network layer [11], [12]. It has been shown
that with simple random linear coding in place of the usual
forwarding, system throughput can be increased in several canonical
network topologies [13], [14]. Although network coding was developed
in the context of efficient multicasting [15], it has since been
adapted to an increasing set of new applications [16], [17], [18].
Recently, random linear network coding within a gossip-based
dissemination protocol was proposed in [19] for wired networks. The
authors show that in a fully-connected network, information
dissemination schemes based on network coding can provide substantial
benefits. The work in [20] extends the result in [19] by deriving
(loose) upper bounds for a wired network with arbitrary topology.

² R2V links are half-duplex, i.e., the base station cannot receive and
transmit simultaneously.

The increased throughput resulting from the use of network coding is
not limited to wired networks. In [21], the authors apply network
coding to unicast flows in wireless networks. Their results show that
a significant gain (in terms of throughput) can be achieved even in
the case of unicast, using simple XOR combining. Sagduyu et al.
investigate the interaction between MAC and network coding, devising
suitable conflict-free transmission schedules in wireless multi-hop
networks [22]. In [23], Yomo and Popovski propose distributed and
opportunistic scheduling rules for combining packets in the presence
of time-varying fading channels.

Recently, network coding has been applied to the data dissemination
problem in wireless networks. In [24], the authors use a gossip-based
algorithm (called rumor spreading) to diffuse information in an ad hoc
network. Their work is continued in [25], which studies the
performance degradation due to actual MAC schemes.

III. Roadside-to-Vehicle Phase: System Model

Consider a platoon of N vehicles interested in a certain file
comprising M packets $x_1, x_2, \ldots, x_M$. Suppose the network
infrastructure (i.e., all the roadside base stations) possesses that
common file. The goal is to distribute this file to all vehicles in
the network. This is achieved by cycling through a succession of
R2V+V2V phases. During each R2V phase, we assume that each of the N
vehicles downloads a constant number of $m \ll M$ packets, i.e., the
duration and rate of download per contact are constant and independent
of the vehicle. For security assurance, we assume that communication
between the BS and each vehicle is encrypted. In other words, the BS
communicates with only one vehicle at a time, and the other nodes
cannot overhear the link between them. That means that in
collaborative downloading, sharing is limited to the V2V phase, and we
forgo the advantage of broadcasting in the R2V phase for the sake of
security.

Obviously, the way the m packets are chosen affects the system
performance. We consider the following two ways of selecting the m
packets in each round:

• Feedback-based scheme: The m packets are chosen randomly by the
serving BS among the packets the vehicle has not yet received. It is
assumed that in each round, nodes can individually signal to the
server the specific indices of the packets yet to be received.

• Network coding aided scheme: In this scheme the BS uses Random
Linear Network Coding (RLNC). In each transmission, it sends a linear
combination of the M packets, where the combining coefficients are
chosen uniformly over the finite field $\mathbb{F}_{2^q}$. In this
scenario no feedback is needed from the vehicles to the BS.
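
The two selection rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `select_feedback` and `draw_rlnc_coefficients` are hypothetical, field elements are represented simply as the integers 0 to 2^q - 1, and the finite-field arithmetic needed to actually form the coded packet is left out.

```python
import random

def select_feedback(unreceived, m, rng):
    """Feedback-based scheme: the BS picks m packets at random among
    the indices the vehicle reported as still missing."""
    pool = sorted(unreceived)
    # Assumption: if fewer than m packets are missing, send all of them.
    return set(rng.sample(pool, min(m, len(pool))))

def draw_rlnc_coefficients(M, q, rng):
    """NC-aided scheme: one coefficient per packet, drawn uniformly from
    the 2**q field elements (represented here as integers)."""
    return [rng.randrange(2 ** q) for _ in range(M)]

rng = random.Random(0)
missing = {2, 5, 7, 8}                       # indices reported by one vehicle
chosen = select_feedback(missing, m=2, rng=rng)
coeffs = draw_rlnc_coefficients(M=10, q=8, rng=rng)
```

Note that `select_feedback` needs the vehicle's list of missing indices as input, whereas `draw_rlnc_coefficients` needs nothing from the vehicle; this is the uplink saving the NC-aided scheme is meant to provide.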

When the last vehicle leaves the BS coverage area, all nodes are in
the V2V phase. In this phase, every node tries to share its
information with the other nodes in the network. After the V2V phase
is complete, all vehicles have the same information. When the vehicles
enter the range of the next BS, they try to obtain the remaining
missing packets. This continues until every node has the full message
of M packets. We call each R2V phase and its following V2V phase a
round. The number of rounds required to send the M packets from the
infrastructure to the vehicles is the quantity of interest. In this
paper, we aim to derive the probability distribution and expected
value of the number of rounds needed to disseminate information to all
N vehicles in the network, using either the feedback-based or the
NC-aided scheme.

Fig. 2. In each transmission, the BS sends a linear combination of the
M packets to a vehicle, and this transmission cannot be seen by any
other vehicle.

For the sake of simplicity, we only consider the case where there are
two vehicles in the network, i.e., N = 2. Generalizing our results to
arbitrary N is left as future work. For each scheme, we first analyze
the simple case of m = 1, i.e., transmitting only one new packet in
each R2V phase of a vehicle. Then, we generalize our results to
arbitrary m.
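
The round structure described in this section can be simulated in a few lines for the feedback-based scheme with N = 2. The sketch below is only an illustration: the function name `simulate_feedback_rounds` is hypothetical, V2V exchange is assumed perfect as in the text, and the rule for the case where fewer than m packets remain (send whatever is left) is an assumption not stated in the paper.

```python
import random

def simulate_feedback_rounds(M, m, rng):
    """Number of R2V+V2V rounds until both vehicles hold all M packets."""
    have = set()                   # packets common to both vehicles after V2V
    rounds = 0
    while len(have) < M:
        missing = sorted(set(range(M)) - have)
        k = min(m, len(missing))   # assumption: send what is left if < m remain
        got = set()
        for _ in range(2):         # each vehicle has its own private R2V contact
            got |= set(rng.sample(missing, k))
        have |= got                # V2V phase: the two vehicles share everything
        rounds += 1
    return rounds

rng = random.Random(1)
M, m = 20, 1
samples = [simulate_feedback_rounds(M, m, rng) for _ in range(2000)]
mean_rounds = sum(samples) / len(samples)
```

With m = 1 every round adds one or two packets, so each sample lies between M/2 and M rounds.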

IV. Dissemination Without Network Coding 

Suppose there are N = 2 vehicles in the network. As mentioned before,
in each R2V phase, a vehicle downloads m packets from the BS. In the
following V2V phase, the vehicles exchange their uncommon packets. In
the i-th round, let $X_i$ denote the number of common packets among
the m packets downloaded by each node. Clearly, $X_i$ is a random
variable taking values in the set $\{0, 1, \ldots, m\}$. Further, let
$S_i$ denote the number of packets each node has at the end of the
i-th round. Clearly, at the end of each round (R2V+V2V), $2m - X_i$
new packets are added to each node. Thus, $S_i$ satisfies the
following recursive equation:

$$S_i = S_{i-1} + 2m - X_i. \tag{1}$$



The dissemination algorithm stops when both cars have all the M
packets, i.e., the whole information. Let $T$ denote the stopping
time, defined as follows:

$$T = \min_{t}\{\, t : S_t = M,\ t > 0 \,\}. \tag{2}$$

In this section we aim to calculate the probability distribution and
expected value of the stopping time for the feedback-based
dissemination algorithm.

A. m = 1, N = 2

Lemma 1. The number of common packets at the i-th round, given
$S_{i-1} = s$, has a probability distribution given by

$$P(X_i = x \mid S_{i-1} = s) = \frac{\binom{m}{x}\binom{M-s-m}{m-x}}{\binom{M-s}{m}}, \qquad x \in \{0, \ldots, m\}. \tag{3}$$

Proof: See the Appendix.

As mentioned before, in each R2V phase a vehicle individually signals
to the BS the specific indices of the packets it has yet to receive.
Thus, when m = 1, the number of packets each node has increases by at
least one in each round. Hence the number of rounds needed to deliver
all the M packets is at most M, i.e., $T$ is bounded from above by M.
On the other hand, at most two new packets can be delivered to the
vehicles in each round. Therefore, the stopping time is bounded from
below by M/2. By the same argument, it is easy to see that for each
$0 < t \le T$, $S_t$ is bounded as $t \le S_t \le 2t$.

By the definition of $T$, we have $S_T = M$. We aim to calculate the
probability distribution of the stopping time, i.e., $P(T = t)$. If
the stopping time is $T$, then at $T - 1$ one and only one of the
following holds:

• $S_{T-1} = M - 1$

• $S_{T-1} = M - 2$

By conditioning on the above events and using the law of total
probability, $P(T = t)$ can be calculated as follows:

$$\begin{aligned}
P(T = t) &= P(T = t \mid S_{t-1} = M-1)\,P(S_{t-1} = M-1) \\
&\quad + P(T = t \mid S_{t-1} = M-2)\,P(S_{t-1} = M-2) \\
&= P(S_t = M \mid S_{t-1} = M-1)\,P(S_{t-1} = M-1) \\
&\quad + P(S_t = M \mid S_{t-1} = M-2)\,P(S_{t-1} = M-2).
\end{aligned} \tag{4}$$

Given $S_{t-1} = M - 1$, the probability of having $S_t = M$ is the
same as the probability of $X_t = 1$, which is given in Eq. (3). Thus,
by using Lemma 1, we have

$$P(S_t = M \mid S_{t-1} = M-1) = P(X_t = 1 \mid S_{t-1} = M-1) = 1,$$

because $S_{t-1} = M - 1$ means that the vehicles are one packet
short. In the next round, the BS will send them that packet to
complete the downloading process.
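
The distribution of the number of common packets used in this argument can be checked numerically. The sketch below assumes the hypergeometric reading of Lemma 1 described in the Appendix (each vehicle independently receives m of the M - s missing packets, chosen without replacement); the function name `common_packet_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def common_packet_pmf(M, s, m):
    """P(X = x | S = s) for x = 0..m, when each of the two vehicles
    receives m of the M - s missing packets chosen without replacement."""
    remaining = M - s
    if remaining < m:
        raise ValueError("fewer than m packets remain")
    total = comb(remaining, m)     # ways to choose the second vehicle's m-set
    return {
        x: Fraction(comb(m, x) * comb(remaining - m, m - x), total)
        for x in range(m + 1)
    }

# m = 1 with six packets missing: the two vehicles collide with probability 1/6.
pmf_m1 = common_packet_pmf(M=10, s=4, m=1)
# One packet missing: both vehicles necessarily receive the same packet.
pmf_last = common_packet_pmf(M=10, s=9, m=1)
```

For m = 1 the two entries are 1/(M - s) for a common packet and (M - s - 1)/(M - s) for distinct packets, and the second call returns probability 1 for x = 1, which is the case used above.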

The above can be summarized in the following lemma:

Lemma 2. Let $S_t$ be the number of packets each node has at the end
of the t-th round. Then

$$P(S_t = s+1 \mid S_{t-1} = s) = P(X_t = 1 \mid S_{t-1} = s) = \frac{1}{M-s} \tag{5}$$

and

$$P(S_t = s+2 \mid S_{t-1} = s) = P(X_t = 0 \mid S_{t-1} = s) = \frac{M-s-1}{M-s}, \tag{6}$$

where (as mentioned before)

$$t \le S_t \le 2t.$$

▲

To calculate $P(T = t)$ in Eq. (4), we need to calculate
$P(S_t = s)$. It can be calculated by conditioning on the following
two mutually exclusive events:

• $S_{t-1} = s - 1$

• $S_{t-1} = s - 2$

Therefore, we have

$$\begin{aligned}
P(S_t = s) &= P(S_t = s \mid S_{t-1} = s-1)\,P(S_{t-1} = s-1) \\
&\quad + P(S_t = s \mid S_{t-1} = s-2)\,P(S_{t-1} = s-2) \\
&= P(X_t = 1 \mid S_{t-1} = s-1)\,P(S_{t-1} = s-1) \\
&\quad + P(X_t = 0 \mid S_{t-1} = s-2)\,P(S_{t-1} = s-2) \\
&= \frac{1}{M-s+1}\,P(S_{t-1} = s-1) + \frac{M-s+1}{M-s+2}\,P(S_{t-1} = s-2).
\end{aligned} \tag{7}$$

So $P(S_t = s)$ can be calculated recursively using the above equation
and the following initialization:

$$P(S_1 = 1) = P(X_1 = 1) = \frac{1}{M}, \qquad P(S_1 = 2) = P(X_1 = 0) = \frac{M-1}{M}. \tag{8}$$



We summarize the above findings in the following theorem:

Theorem 1. Let $T$ be the stopping time. Then it has the following
probability distribution:

$$P(T = t) = P(S_{t-1} = M-1) + \frac{1}{2}\,P(S_{t-1} = M-2), \tag{9}$$

where $\frac{M}{2} \le t \le M$. $P(S_t = s)$ can be calculated using
the following recursive formulation:

$$P(S_t = s) = \frac{1}{M-s+1}\,P(S_{t-1} = s-1) + \frac{M-s+1}{M-s+2}\,P(S_{t-1} = s-2), \tag{10}$$

which has the following initialization expressions:

$$P(S_1 = 1) = \frac{1}{M}, \qquad P(S_1 = 2) = \frac{M-1}{M}. \tag{11}$$
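
The recursion in Theorem 1 is straightforward to evaluate exactly. The following sketch is an illustration under stated assumptions: the function name `stopping_time_pmf` is hypothetical, the per-state transition probabilities 1/(M - s) and (M - s - 1)/(M - s) are those of Lemma 2, and the recursion is started from $S_0 = 0$, which reproduces the initialization for $S_1$ after one step.

```python
from fractions import Fraction

def stopping_time_pmf(M):
    """P(T = t) for the feedback-based scheme with N = 2 vehicles, m = 1."""
    dist = {0: Fraction(1)}        # dist[s] = P(S_{t-1} = s), with S_0 = 0
    pmf = {}
    for t in range(1, M + 1):
        # Finish from M-1 with probability 1, from M-2 with probability 1/2.
        p = dist.get(M - 1, Fraction(0)) + Fraction(1, 2) * dist.get(M - 2, Fraction(0))
        if p:
            pmf[t] = p
        nxt = {}
        for s, pr in dist.items():
            left = M - s
            if left == 0:
                continue           # already complete; no further transitions
            # Both vehicles receive the same packet: one new packet.
            nxt[s + 1] = nxt.get(s + 1, Fraction(0)) + pr * Fraction(1, left)
            # The vehicles receive different packets: two new packets.
            if left >= 2:
                nxt[s + 2] = nxt.get(s + 2, Fraction(0)) + pr * Fraction(left - 1, left)
        dist = nxt
    return pmf

pmf = stopping_time_pmf(3)
expected_rounds = sum(t * p for t, p in pmf.items())
```

Using exact fractions makes it easy to confirm that the probabilities sum to one and that the support stays within the bounds M/2 and M derived above.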



B. N = 2 and arbitrary m

The result in the previous section can be readily generalized as
follows. Its proof is omitted here for lack of space.

Theorem 2. Let the BS have M packets to distribute among two vehicles.
In each R2V phase it transmits m packets to each car separately. Let
$T$ denote the stopping time. Then it has the following probability
distribution:

$$P(T = t) = \sum_{i=m}^{2m} \frac{\binom{m}{2m-i}}{\binom{i}{m}}\,P(S_{t-1} = M-i), \tag{12}$$

where $\frac{M}{2m} \le t \le \frac{M}{m}$. Moreover, $P(S_t = s)$ can
be calculated using the following recursive formula:

$$P(S_t = s) = \sum_{i=m}^{2m} \frac{\binom{m}{2m-i}\binom{M-s+i-m}{i-m}}{\binom{M-s+i}{m}}\,P(S_{t-1} = s-i), \tag{13}$$

which has the following initialization expressions:

$$P(S_1 = i) = \frac{\binom{m}{2m-i}\binom{M-m}{i-m}}{\binom{M}{m}}, \qquad i = m, \ldots, 2m. \tag{14}$$

Proof: See [26].
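
As a sanity check, Theorem 2 should reduce to Theorem 1 when m = 1. The short calculation below assumes that the coefficient multiplying $P(S_{t-1} = M-i)$ in Eq. (12) is read as $\binom{m}{2m-i} / \binom{i}{m}$, i.e., the probability that the two vehicles together cover all i remaining packets; this reading is an interpretation of the printed formula, not a statement taken from the proof in [26].

```latex
\begin{align*}
m = 1,\ i = 1:&\quad \frac{\binom{1}{1}}{\binom{1}{1}} = 1, \\
m = 1,\ i = 2:&\quad \frac{\binom{1}{0}}{\binom{2}{1}} = \frac{1}{2}, \\
\Rightarrow\quad P(T = t) &= P(S_{t-1} = M-1) + \tfrac{1}{2}\,P(S_{t-1} = M-2),
\end{align*}
```

which is the expression in Eq. (9).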



V. Network Coding Aided Scheme 

Suppose the BS uses random linear network coding. That means that in
each transmission it sends a linear combination of the M packets,
where the combination coefficients are chosen uniformly from the
finite field $\mathbb{F}_{2^q}$. Following the same procedure as for
the feedback-based algorithm, we first derive the probability
distribution and expectation of the stopping time for the very simple
case of N = 2 and m = 1. Then, we solve the problem in the more
general case of N = 2 and arbitrary m.

A. m = 1, N = 2

The BS sends a random linear combination of its M packets in the R2V
phase. For example, in the first round, it sends the following packet
to the first vehicle:

$$\sum_{i=1}^{M} a_{1,i}\,x_i, \qquad a_{1,i} \in \mathbb{F}_{2^q}, \tag{15}$$

where the $x_i$'s are the BS information packets, as defined in
Section III. The following combination is sent to the second vehicle:

$$\sum_{i=1}^{M} a_{2,i}\,x_i, \qquad a_{2,i} \in \mathbb{F}_{2^q}. \tag{16}$$

At the end of the first round (after one R2V+V2V), both vehicles have
the same information, i.e., they both have the packets in Eq. (15) and
Eq. (16). The above can be written as $\mathbf{A}\mathbf{x}$, where
$\mathbf{x} = [x_1\ x_2\ \cdots\ x_M]^T$ is an $M \times 1$ vector of
BS packets and $\mathbf{A}$ is a $2 \times M$ matrix containing the NC
coefficients. Clearly, at the end of the second round two rows are
added to matrix $\mathbf{A}$, and $\mathbf{A}\mathbf{x}$ is a
$4 \times 1$ vector representing the 4 packets in each vehicle. In
general, two rows are added to $\mathbf{A}$ at the end of each round.
Therefore, at the end of round t, matrix $\mathbf{A}$ has dimension
$2t \times M$. Clearly, the nodes are able to recover the BS
information whenever $\mathbf{A}$ has full column rank, i.e., when
$\mathrm{rank}(\mathbf{A}) = M$. Thus, we can define the stopping time
when NC is applied, $T_{NC}$, as follows:

$$T_{NC} = \min_{t}\{\, t : \mathrm{rank}(\mathbf{A}_{2t \times M}) = M \,\}. \tag{17}$$

Obviously $T_{NC} \ge M/2$. As mentioned before, our goal is to
calculate the probability distribution and expected value of
$T_{NC}$. The following lemma gives the probability that a matrix
$\mathbf{A}_{t \times M}$ with random entries in a finite field has
full rank.

Lemma 3. Let $\mathbf{A}_{t \times n}$ be a random matrix over the
finite field $\mathbb{F}_{2^q}$ such that each entry $a_{ij}$ is
picked uniformly from $\mathbb{F}_{2^q}$. Suppose $t \ge n$. Then the
probability of $\mathbf{A}$ having rank $n$ is given as follows:

$$P(\mathrm{rank}(\mathbf{A}_{t \times n}) = n) = \prod_{i=1}^{n}\left(1 - \frac{1}{q^{(t-n+i)}}\right) \approx 1 - \frac{1}{q^{(t-n+1)}}, \tag{18}$$

where the approximation is valid for sufficiently large q.
Proof : See 
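
The product in Lemma 3 and its large-field approximation are easy to compare numerically. In the sketch below the parameter `field_size` plays the role that q plays in the formula above (the number of field elements); the function names are hypothetical, and the code only evaluates the formula, it does not prove it.

```python
from fractions import Fraction

def full_rank_probability(t, n, field_size):
    """Product form: prod over i = 1..n of (1 - field_size**-(t - n + i))."""
    if t < n:
        raise ValueError("need t >= n")
    p = Fraction(1)
    for i in range(1, n + 1):
        p *= 1 - Fraction(1, field_size ** (t - n + i))
    return p

def large_field_approximation(t, n, field_size):
    """Keep only the dominant i = 1 factor of the product."""
    return 1 - Fraction(1, field_size ** (t - n + 1))

exact = full_rank_probability(10, 10, 256)
approx = large_field_approximation(10, 10, 256)
gap = approx - exact               # small and non-negative for a large field
```

For a 2 x 2 matrix over the two-element field the product evaluates to 3/8, matching the 6 invertible matrices out of 16, which is a quick way to see that the indexing of the exponents is right.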



Using the above result, an upper bound on the probability of the
stopping time can be calculated as given in the following lemma. Its
proof is deferred to the Appendix.

Lemma 4. Let $T_{NC}$ be the stopping time defined in Eq. (17). Then

$$P(T_{NC} = t) \le \left(1 - \frac{1}{q}\right)\bigl(1 - P(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M)\bigr) \approx \left(1 - \frac{1}{q}\right)\frac{1}{q^{(2t-M)}}, \tag{19}$$

where the last approximation is valid for large q.

Proof: See the Appendix.



Finally, the following theorem gives an upper bound for the expected
value of the stopping time when q is large enough.

Theorem 3. Let $T_{NC}$ be the stopping time defined in (17). Suppose
q is large enough that the approximations in equations (18) and (19)
are valid. Then the expected value of the stopping time is bounded
from above as follows:

$$E[T_{NC}] \le \frac{M}{2} + \frac{1}{q-1}. \tag{20}$$
▲
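
The bound in Theorem 3 can be checked numerically against the per-round bound of Lemma 4. The sketch below assumes the readings $P(T_{NC} = t) \lesssim (1 - 1/q)\,q^{-(2t-M)}$ for Lemma 4 and $M/2 + 1/(q-1)$ for Theorem 3, takes M even so that M/2 is an integer, and truncates the sum after a fixed number of terms; the function names and the `extra_terms` parameter are hypothetical.

```python
from fractions import Fraction

def summed_pmf_bound(M, q, extra_terms=200):
    """Sum of t * (1 - 1/q) * q**-(2t - M) for t = M/2, M/2 + 1, ..."""
    if M % 2:
        raise ValueError("this sketch assumes an even M")
    start = M // 2
    total = Fraction(0)
    for t in range(start, start + extra_terms):
        total += t * (1 - Fraction(1, q)) * Fraction(1, q ** (2 * t - M))
    return total

def theorem3_bound(M, q):
    """Closed-form bound M/2 + 1/(q - 1)."""
    return Fraction(M, 2) + Fraction(1, q - 1)

slack = theorem3_bound(20, 16) - summed_pmf_bound(20, 16)
```

The terms decay geometrically in t, so the sum is dominated by t = M/2; this is why the bound sits only slightly above the minimum possible number of rounds.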



B. N = 2 and arbitrary m

The results in the previous section can be readily generalized to
arbitrary m, the number of packets transmitted from the BS to each
vehicle in each R2V phase. The proof is omitted here for lack of
space.

Theorem 4. Let $T_{NC}$ be the stopping time defined in (17). Suppose
q is large enough that the approximations in equations (18) and (19)
are valid. Then the expected value of the stopping time is bounded
from above as follows:

$$E[T_{NC}] \le \frac{M}{2m} + \frac{1}{q-1}. \tag{21}$$



VI. Evaluation Results 

In this section, we present results from simulations conducted using
MATLAB R2008b. We conduct simulations to confirm the analytical
results in Section IV and Section V, which describe data dissemination
in the R2V phase without and with network coding, respectively.

Figure 3 depicts the average number of rounds needed to disseminate
information from the infrastructure to the vehicles. As explained
before, we assume complete information exchange in the V2V phase. In
the simulation, the BS holds M packets, and in each round it sends m
packets to each vehicle separately. As mentioned before, in this paper
we limit ourselves to N = 2 vehicles.

As one can see in Figure 3, T increases linearly with M, the total
number of information packets the BS possesses. There is very little
benefit in using NC when the metric is the number of rounds. In fact,
for only two vehicles, the feedback-based algorithm needs almost one
round more than the NC-aided scheme (note that NC almost achieves the
minimum number of rounds). Therefore, in terms of dissemination delay,
NC has a small advantage compared to the feedback-based algorithm.
However, the NC scheme removes any need for signaling. In the NC-aided
scheme the BS keeps sending network-coded data to the vehicles without
any knowledge of what information they possess. In the feedback-based
algorithm, on the other hand, vehicles are required to communicate
with the BS before receiving any data: they need to inform the BS
which packets they are interested in.

VII. Conclusion 

In this paper, we consider the time needed (in R2V+V2V rounds) to
distribute an object of M packets to two vehicles in a VANET. We show
that using NC has the advantage of removing any need for signaling. In
other words, to transmit the data from the infrastructure to the
vehicles, there is no need for the BS to know which packets the
vehicles already possess. This advantage of NC comes at no cost. In
fact, the data dissemination latency is slightly better when NC is
applied. However, our analysis here is limited to two vehicles.
Generalizing the results to an arbitrary number of vehicles in the
network is one of our future research directions.
Appendix A
Proofs of Theorems

Proof of Lemma 1

If we know $S_{i-1} = s$, it means the BS has $M - s$ new packets to
send to the vehicles. Given that, the denominator counts all possible
ways of choosing m = 1 objects from $M - s$ without replacement. The
numerator counts all the possibilities such that exactly x packets are
common among the m-sets at the two nodes.



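Proof of Lemma 4

For a matrix $\mathbf{A}_{m \times n}$, we define
$\mathrm{span}\{\mathbf{A}_{m \times n}\}$ as the space spanned by the
rows of $\mathbf{A}_{m \times n}$. By the definition of the stopping
time $T_{NC}$ in Eq. (17), the event $T_{NC} = t$ is equivalent to
having $\mathrm{rank}(\mathbf{A}_{2t \times M}) = M$. Now, let
$\mathbf{a}_{2t}$ be the last row of matrix $\mathbf{A}_{2t \times M}$;
then we have:

$$\begin{aligned}
P(T_{NC} = t) &= P(\mathrm{rank}(\mathbf{A}_{2t \times M}) = M) \\
&= P\bigl(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1 \ \&\ \mathbf{a}_{2t} \notin \mathrm{span}\{\mathbf{A}_{(2t-1) \times M}\}\bigr) \\
&= P(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1) \\
&\quad \cdot P\bigl(\mathbf{a}_{2t} \notin \mathrm{span}\{\mathbf{A}_{(2t-1) \times M}\} \mid \mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1\bigr).
\end{aligned} \tag{22}$$

Given that the rank of $\mathbf{A}_{(2t-1) \times M}$ is $M - 1$, the
probability of adding a row which is not independent of all the other
rows can be bounded from below as follows:

$$\begin{aligned}
P\bigl(\mathbf{a}_{2t} \in \mathrm{span}\{\mathbf{A}_{(2t-1) \times M}\} \mid \mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1\bigr)
&\ge P\bigl(\mathbf{a}_{2t} = 0 \mid \mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1\bigr) \\
&= \frac{1}{q}.
\end{aligned}$$

Therefore, we have the following bound on the probability that the
last row of $\mathbf{A}_{2t \times M}$ is independent of the rest of
the rows:

$$\begin{aligned}
P\bigl(\mathbf{a}_{2t} \notin \mathrm{span}\{\mathbf{A}_{(2t-1) \times M}\} \mid \mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1\bigr)
&= 1 - P\bigl(\mathbf{a}_{2t} \in \mathrm{span}\{\mathbf{A}_{(2t-1) \times M}\} \mid \mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1\bigr) \\
&\le 1 - \frac{1}{q}.
\end{aligned} \tag{23}$$

On the other hand:

$$\begin{aligned}
P(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M-1) &\le P(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) \le M-1) \\
&\le 1 - P(\mathrm{rank}(\mathbf{A}_{(2t-1) \times M}) = M) \\
&\approx \frac{1}{q^{(2t-M)}}.
\end{aligned} \tag{24}$$

By substituting Eq. (23) and Eq. (24) in Eq. (22), the lemma is
proven.

■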

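Proof of Theorem 3

$$\begin{aligned}
E[T_{NC}] &= \sum_{t=M/2}^{M} t \cdot P(T_{NC} = t) \\
&\le \sum_{t=M/2}^{M} t \left(1 - \frac{1}{q}\right)\frac{1}{q^{(2t-M)}} \\
&= \left(1 - \frac{1}{q}\right) q^{M} \sum_{t=M/2}^{M} \frac{t}{q^{2t}} \\
&\le \frac{M}{2} + \frac{1}{q-1}.
\end{aligned} \tag{25}$$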



